Add an API to fetch repositories and list their branches. #12
Conversation
Got a couple of comments, but on the whole it looks good!
#' is raised.
#' @return the path to the repository.
repository_path <- function(base, url, check = TRUE) {
  hash <- openssl::sha1(url)
I don't mind this, but it also makes it less easy for us to poke around. You said it's to help with weird characters and stuff in the URL; what's a character that GitHub allows that would break it?
Yeah, I do get the less-pokeable argument and was a bit worried about it. It is still possible to get the URL from a cloned repo by doing `git remote get-url origin`, although of course it's not as obvious at a glance as just looking at directory names.
The "weird character" concern comes in large part from the fact that I did not want to assume anything about GitHub here, so we need to encode the full URL, not just the org and repo names. That includes `:` and `@`, e.g. `git@github.com:org/repo` (technically Linux allows any characters but NUL and `/` in filenames, but that's not really a game I want to play). We would need to deal with slashes (create nested directories?), including leading slashes if the URL is actually a file path (which the tests do a lot). You also have to consider what effect `.` and `..` in the URL have on the filesystem (should `http://foo/bar/..` and `http://foo` map to the same repository in the cache?).
I suppose the alternative is to percent-encode the URL, which would avoid all the special characters (though it would do nothing about `.` and `..`), but you still have a problem if the URL is longer than 255 characters, which is the filename length limit on most Linux filesystems. Not a likely scenario, but a possible one.
Using a hash of the URL was a very easy way to work around all these problems, by guaranteeing a fixed-length, unique name using only safe characters.
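The trade-off can be sketched in Python (for illustration only; the package itself computes the hash in R with `openssl::sha1`, and the example URLs are made up). Percent-encoding keeps the name readable but grows with the URL, while a SHA-1 hash is always 40 filesystem-safe hex characters:

```python
import hashlib
from urllib.parse import quote

# Hypothetical URLs of the kinds discussed above: an SSH-style remote,
# a local file path (as used in tests), and an overly long URL.
urls = [
    "git@github.com:org/repo",
    "/tmp/some/local/path",
    "http://example.com/" + "a" * 300,
]

for url in urls:
    encoded = quote(url, safe="")  # percent-encoding: length tracks the URL
    hashed = hashlib.sha1(url.encode()).hexdigest()  # always 40 hex chars
    print(len(encoded), len(hashed))
```

The last URL percent-encodes to well over 255 characters and so could not be used as a filename, while its hash stays at a fixed 40.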
Ah okay, I was just thinking of a very simple directory structure like `org/repo`; I wasn't thinking about the whole URL. Although, is there any reason we even want the whole URL? And GitHub repos are pretty clean, they only allow the following:
and also `.` and `..` are reserved names, so I think that's chill too.
But yeah, you're right about it being us relying on GitHub.
I do think that in practice none of these issues will ever affect us with the `org/repo` directory structure, but the hash of the whole URL doesn't care about where the repo actually is, so it's more general, I guess.
happy to leave it to your judgement!
If we do go with the hashes though, I think it would be useful to add a tiny script in another PR that prints a mapping of hashes to origin URLs by doing a simple directory walk, so that we can debug easily.
> is there any reason we even want the whole url?
Just being able to use a file path as the URL in unit tests is a pretty good use case, instead of having to mock GitHub or have some tedious fake setup.
> if we do it with the hashes though, i think it would be useful to do a tiny script in another pr that prints a mapping of hashes to origin url by doing a simple dir walk so that we can easily debug
Sure can do
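The debugging script suggested above could look something like the following sketch (Python for illustration; whether this lives in a standalone script or a package function, and the exact cache layout, are assumptions). It walks the cache directory and recovers each clone's URL with `git remote get-url origin`, as mentioned earlier in the thread:

```python
import os
import subprocess

def print_hash_mapping(base):
    """Walk the repository cache under `base` and print each
    hash-named directory alongside its origin URL."""
    for entry in sorted(os.listdir(base)):
        path = os.path.join(base, entry)
        if not os.path.isdir(os.path.join(path, ".git")):
            continue  # skip anything that is not a git clone
        url = subprocess.run(
            ["git", "-C", path, "remote", "get-url", "origin"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(f"{entry}  {url}")
```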
branches = data.frame(
  name = branches$name,
  commit_hash = branches$commit,
  time = as.numeric(branches$updated),
  message = message,
  row.names = NULL
)
Not a suggestion, just curious as to why you used `data.frame` instead of `list`. Is it because it fills in the data frame and serialises each row, so you don't have to wrap each branch name in a scalar, for example?
Also, are we not worrying about the newline characters? Rich was worried about them for some reason in the past; I can't remember what we decided on that.
> not a suggestion, just curious as to why you used data.frame instead of list, is it because it fills in the data frame and serialises each row and you dont have to wrap each branch name in scalar for example?
Err, I don't know. It just works. I don't even know what the easy way to turn this into a list would be.
If I just replace `data.frame` with `list` and do no other transformation, the result is radically different: it would be one object with fields `name`, `commit_hash`, ..., where each entry is an array of strings, which is not idiomatic JSON at all. Using `data.frame` gives the expected result because jsonlite has special support for it. orderly2 does this all the time when serializing, e.g. for the list of files in a packet.
The other way would be to do the pivot myself and generate a list of lists, where each nested list contains a bunch of scalars. Doable, but way more tedious and verbose.
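The difference between the two JSON shapes can be shown in Python (illustration only; the project itself is R and relies on jsonlite's data.frame handling, and the branch data here is made up). A plain list of vectors serializes column-wise, while a data.frame serializes row-wise, one record per branch:

```python
import json

# Column-wise layout: what serializing a plain list of vectors gives.
# Each field is one array spanning all branches -- not idiomatic JSON.
columns = {
    "name": ["main", "dev"],
    "commit_hash": ["abc123", "def456"],
}

# Row-wise layout: what jsonlite produces for a data.frame.
# One object per branch, with scalar fields -- the expected result.
rows = [
    {"name": n, "commit_hash": c}
    for n, c in zip(columns["name"], columns["commit_hash"])
]

print(json.dumps(columns))
print(json.dumps(rows))
```

The second form is the "pivot" mentioned above: doing it by hand means building the list of per-branch records yourself.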
> also are we not worrying about the new line characters? rich was worrying about them for some reason in the past, i cant remember what we decided on that
I never understood this concern: JSON can serialize newline characters just fine. But yeah, you are right that this breaks the outpack_server API. I can go back to using a list of strings if you prefer, but it makes no sense to me.
Yeah, I mean, feel free to include newline characters; we can always fix it pretty easily if needed. Do you think it's worth having the type as an array of 1, though? Just thinking about the fact that we may have to migrate the database to be just string, and then, if we implement a newline-character change in the future, it'll have to be migrated again to string arrays. I don't mind either way.
> just thinking about the fact that we may have to migrate the database to be just string and then if we implement a newline char change in the future itll have to be migrated again to string arrays
This doesn't show up in the database, does it?
Maybe not, I'm not 100% sure, but it does show up in the data class for the response. It should be easy enough to change, but we might want to make a quick ticket for it so we don't forget!
This API is a replacement for the one currently provided by outpack_server. A single instance of the orderly.runner API is able to operate over an arbitrary number of Git repositories. The first time a URL is fetched, a `git clone` is performed. Subsequent fetches are incremental, using `git fetch`. Repositories are each stored in their own directory, named using the hash of the URL. Using the hash makes it easy to support arbitrary upstreams, without having to worry about encoding the URL's characters as file names. The existing endpoints still operate on the "root" repository, which is assumed to exist before the API is launched. Eventually all those APIs will be migrated to take a `url` argument.
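The clone-then-fetch behaviour described above can be sketched as follows (Python for illustration; the actual implementation is in R, and the function name and exact git flags here are assumptions, though the hash-named directory layout matches the description):

```python
import hashlib
import os
import subprocess

def fetch_repository(base, url):
    """Clone `url` into a hash-named directory under `base` on the
    first fetch, and update it with `git fetch` on subsequent calls."""
    path = os.path.join(base, hashlib.sha1(url.encode()).hexdigest())
    if not os.path.exists(path):
        # First fetch of this URL: perform a full clone.
        subprocess.run(["git", "clone", "--quiet", url, path], check=True)
    else:
        # Repository already cached: incremental update only.
        subprocess.run(["git", "-C", path, "fetch", "--quiet"], check=True)
    return path
```

Because the directory name depends only on the URL, repeated calls for the same URL always land on the same cached clone.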
I approve
@@ -1,6 +1,6 @@
parse_main <- function(args = commandArgs(TRUE)) {
  usage <- "Usage:
orderly.runner.server [options] <path>
orderly.runner.server [options] <path> <repositories>
here too, right?
- orderly.runner.server [options] <path> <repositories>
+ orderly.runner.server [options] <path> <repositories-base-path>